# Lightweight Inference

**DeepSeek-R1-Distill-Qwen-14B-GGUF** · featherless-ai-quants · Large Language Model · 237 downloads · 1 like
DeepSeek-R1-Distill-Qwen-14B is a 14B-parameter model released by DeepSeek AI, distilled from DeepSeek-R1 onto the Qwen architecture; this repository provides multiple GGUF quantizations for faster, lower-memory inference (a download sketch follows this entry).

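Repositories like the one above usually publish each quantization level as a separate .gguf file. The sketch below shows one way to fetch a single file with the huggingface_hub client; the repository ID and filename are illustrative assumptions, so check the repository's actual file listing before use.

```python
# Hypothetical example: download one GGUF quantization from a Hugging Face repo.
# The repo_id and filename below are placeholders, not verified names.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="featherless-ai-quants/deepseek-ai-DeepSeek-R1-Distill-Qwen-14B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",  # assumed quantization file
)
print("GGUF file saved to:", gguf_path)
```
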
**Magma-8B-GGUF** · Mungert · MIT · Image-to-Text · 545 downloads · 1 like
Magma-8B is an image-text-to-text model distributed in GGUF format, suited to multimodal tasks.

**Qwen3-1.7B-GGUF** · prithivMLmods · Apache-2.0 · Large Language Model · English · 357 downloads · 1 like
Qwen3 is the latest generation of the Tongyi Qianwen (Qwen) series of large language models, offering both dense and mixture-of-experts (MoE) variants. Built on large-scale training, Qwen3 delivers major gains in reasoning, instruction following, agent capabilities, and multilingual support.

**GLM-4-9B-0414-GGUF** · unsloth · MIT · Large Language Model · Multilingual · 4,291 downloads · 9 likes
GLM-4-9B-0414 is a lightweight 9B-parameter member of the GLM family that performs well on mathematical reasoning and general tasks, making it an efficient option for resource-constrained deployments.

**Qwen3-8B-Q4_K_M-GGUF** · ufoym · Apache-2.0 · Large Language Model · Transformers · 342 downloads · 3 likes
A GGUF-format build of Qwen3-8B that runs under the llama.cpp framework and supports text generation; a minimal loading sketch follows this entry.

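Several entries in this list, including the one above, target the llama.cpp runtime. The sketch below shows one common way to load such a GGUF file through the llama-cpp-python bindings; the local filename, context size, and prompt are assumptions for illustration rather than values taken from the repository.

```python
# Minimal sketch: run a local GGUF model with llama-cpp-python
# (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-8b-q4_k_m.gguf",  # assumed local filename of the downloaded GGUF
    n_ctx=4096,                         # context window to allocate
    n_gpu_layers=-1,                    # offload all layers to GPU if one is available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in two sentences."}],
    max_tokens=200,
)
print(result["choices"][0]["message"]["content"])
```
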
**Gemma-2-9b-it-abliterated-GGUF** · bartowski · Large Language Model · English · 3,941 downloads · 37 likes
A quantized build of the abliterated Gemma-2-9B-it model, produced with llama.cpp and suitable for running in LM Studio.

**Phi-4-mini-instruct.gguf** · Mungert · MIT · Large Language Model · Other · 13.08k downloads · 25 likes
Phi-4-mini-instruct is a lightweight open model trained with a focus on high-quality, reasoning-rich data and supporting a 128K-token context length.

**3b-zh-ft-research_release-Q8_0-GGUF** · cludyw · Apache-2.0 · Large Language Model · Chinese · 20 downloads · 0 likes
A GGUF-format conversion of canopylabs/3b-zh-ft-research_release, suited to Chinese text generation tasks.

**google_gemma-3-1b-it-qat-GGUF** · bartowski · Large Language Model · 1,437 downloads · 2 likes
Multiple quantized builds derived from Google's Gemma 3 1B QAT (quantization-aware training) weights, intended for local inference.

**google_gemma-3-12b-it-qat-GGUF** · bartowski · Large Language Model · 10.78k downloads · 16 likes
Quantizations of the Gemma-3-12b model built from Google's QAT (quantization-aware training) weights, offered in multiple variants to match different hardware budgets.

**GLM-4-9B-0414** · THUDM · MIT · Large Language Model · Transformers · Multilingual · 6,856 downloads · 55 likes
GLM-4-9B-0414 is a lightweight 9B-parameter member of the GLM family with strong mathematical reasoning and general-task performance, ranking near the top among open models of similar scale.

**Orpheus-3b-0.1-ft-Q8_0-GGUF** · dodgeinmedia · Apache-2.0 · Large Language Model · English · 22 downloads · 0 likes
A GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Orpheus-3b-0.1-ft-Q2_K.gguf** · athenasaurav · Apache-2.0 · Large Language Model · English · 25 downloads · 0 likes
A GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Orpheus-3b-0.1-ft-Q4_K_M-GGUF** · athenasaurav · Apache-2.0 · Large Language Model · English · 162 downloads · 0 likes
A GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Deepseek-V3-5layer** · chwan · Large Language Model · Transformers · 30.01k downloads · 1 like
A simplified 5-layer variant of DeepSeek-V3 for lightweight tasks and rapid experimentation.

**Arrowmint-Gemma3-4B-YUKI-v0.1** · DataPilot · Large Language Model · Multilingual · 73 downloads · 6 likes
A Japanese-language model tuned for AI VTuber (virtual YouTuber) conversation, built on Google's gemma-3-4b-it.

**Orpheus-3b-0.1-ft-Q4_K_M-GGUF** · freddyaboulton · Apache-2.0 · Large Language Model · English · 30 downloads · 1 like
A GGUF-quantized build of Orpheus-3B-0.1-FT for efficient inference.

**Gemma-3-4b-it-GGUF** · ysn-rfd · Large Language Model · 62 downloads · 1 like
Converted from google/gemma-3-4b-it to GGUF format with llama.cpp, for local deployment and inference; a conversion sketch follows this entry.

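As a point of reference for the conversion step mentioned above, llama.cpp ships a convert_hf_to_gguf.py script that turns a local Hugging Face checkpoint into a GGUF file. The sketch below is an assumption-based illustration only: the llama.cpp checkout location, the local model directory, and the output settings are placeholders, and the available options should be confirmed against the script's --help.

```python
# Hypothetical HF -> GGUF conversion using llama.cpp's convert_hf_to_gguf.py.
# All paths are placeholders; the positional argument must be a local directory
# holding a downloaded snapshot of google/gemma-3-4b-it.
import subprocess

subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "models/gemma-3-4b-it",                 # local snapshot of the original checkpoint
        "--outfile", "gemma-3-4b-it-f16.gguf",  # where to write the converted model
        "--outtype", "f16",                     # keep full precision; quantize in a later step
    ],
    check=True,
)
```
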
**bge-reranker-v2-m3-Q5_K_M-GGUF** · pyarn · Apache-2.0 · Text Embedding · Other · 31 downloads · 1 like
Converted from BAAI/bge-reranker-v2-m3 into GGUF format with llama.cpp through ggml.ai's GGUF-my-repo space; the base model is a reranker, used mainly for text-ranking (relevance scoring) tasks.

**Orpheus-3b-0.1-ft-Q2_K-GGUF** · Zetaphor · Apache-2.0 · Large Language Model · English · 67 downloads · 1 like
A GGUF-format model converted from canopylabs/orpheus-3b-0.1-ft, suited to text generation tasks.

**Phi-4-mini-instruct-abliterated** · lunahr · MIT · Large Language Model · Transformers · Multilingual · 250 downloads · 8 likes
Phi-4-mini-instruct is a lightweight open model built on synthetic data and curated public web content, with an emphasis on high-quality, reasoning-rich data. It supports a 128K-token context length and is refined with supervised fine-tuning and direct preference optimization for precise instruction following and safety.

**Phi-4-mini-instruct** · microsoft · MIT · Large Language Model · Transformers · Multilingual · 346.30k downloads · 455 likes
Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered public web data, focused on high-quality, reasoning-rich data. It supports a 128K-token context length and multilingual use.

**Mistral-Small-24B-Instruct-2501-GGUF** · MaziyarPanahi · Large Language Model · 474.73k downloads · 2 likes
A GGUF-quantized build of Mistral-Small-24B-Instruct-2501 for local deployment and text generation.

**rank_zephyr_7b_v1_full-GGUF** · tensorblock · MIT · Large Language Model · English · 66 downloads · 0 likes
A GGUF-quantized build of castorini/rank_zephyr_7b_v1_full, designed for text-ranking tasks.

**Llama-3.2-3B-Instruct-abliterated-GGUF** · ZeroWw · MIT · Large Language Model · English · 20 downloads · 2 likes
A quantization recipe in which the output and token-embedding tensors are kept in f16 while the remaining tensors use q5_k or q6_k, yielding a smaller file with quality close to a pure f16 model; a quantization sketch follows this entry.

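The mixed-precision recipe described above can be reproduced, roughly, with llama.cpp's llama-quantize tool, which accepts per-tensor overrides for the output and token-embedding tensors. The sketch below is a hedged illustration: the binary path and file names are placeholders, and the exact flag spellings should be verified against `llama-quantize --help` in your own build.

```python
# Assumed invocation of llama.cpp's llama-quantize: quantize most tensors to Q6_K
# while keeping the output and token-embedding tensors in f16, as the entry describes.
# File names are placeholders; check the flags against --help before relying on them.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--output-tensor-type", "f16",          # keep the output (lm_head) tensor at f16
        "--token-embedding-type", "f16",        # keep the token embeddings at f16
        "Llama-3.2-3B-Instruct-f16.gguf",       # placeholder: full-precision input GGUF
        "Llama-3.2-3B-Instruct-q6_k-mix.gguf",  # placeholder: output file
        "Q6_K",                                 # base quantization for the remaining tensors
    ],
    check=True,
)
```
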
**T5-Large-Q4_K_M-GGUF** · tianlp · Apache-2.0 · Large Language Model · Multilingual · 16 downloads · 0 likes
A GGUF conversion of google-t5/t5-large supporting tasks such as summarization and translation across several languages, including English, French, Romanian, and German, among others.

**3danimationdiffusion-v10-GGUF** · second-state · OpenRAIL · Image Generation · English · 182 downloads · 5 likes
A 3D-animation-style text-to-image model based on Stable Diffusion, supporting Disney- and anime-style 3D image generation.

**Phi-3.5-mini-instruct_Uncensored-GGUF** · bartowski · Apache-2.0 · Large Language Model · 1,953 downloads · 42 likes
Phi-3.5-mini-instruct_Uncensored is a quantized language model offered in variants suited to a range of hardware.

**Stable-Diffusion-V1-5-GGUF** · second-state · OpenRAIL · Image Generation · 12.24k downloads · 11 likes
Stable Diffusion v1.5 is a text-to-image model that generates high-quality images from textual descriptions.

**Phi-3-vision-128k-instruct** · microsoft · MIT · Image-to-Text · Transformers · Other · 25.19k downloads · 958 likes
Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model supporting a 128K-token context length, focused on high-quality reasoning over text and images.

**Phi-3-small-8k-instruct** · microsoft · MIT · Large Language Model · Transformers · Other · 22.92k downloads · 165 likes
Phi-3-Small-8K-Instruct is a 7B-parameter lightweight open model focused on high-quality reasoning, supporting an 8K context length and suited to commercial and research use in English settings.

**Phi-3-medium-4k-instruct** · microsoft · MIT · Large Language Model · Transformers · Other · 43.60k downloads · 219 likes
Phi-3-Medium-4K-Instruct is a 14B-parameter lightweight open model focused on high-quality reasoning, supporting a 4K context length and suited to commercial and research use in English settings.

**Vecteus-v1-gguf** · Local-Novel-LLM-project · Apache-2.0 · Large Language Model · Multilingual · 588 downloads · 8 likes
A GGUF-format build of Vecteus-v1 supporting English and Japanese text generation.

**phi-3-mini-4k-instruct-GGUF** · brittlewis12 · MIT · Large Language Model · 170 downloads · 1 like
Phi-3-Mini-4K-Instruct is a 3.8B-parameter lightweight, state-of-the-art open model trained on the Phi-3 datasets, which emphasize high-quality, reasoning-dense data.

**Phi-3-mini-4k-instruct-gguf** · microsoft · MIT · Large Language Model · Multilingual · 20.51k downloads · 488 likes
Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained with a focus on high-quality, reasoning-dense data and suited to commercial and research use in English.

**Phi-3-mini-128k-instruct** · microsoft · MIT · Large Language Model · Transformers · Multilingual · 399.68k downloads · 1,638 likes
Phi-3-Mini-128K-Instruct is a 3.8B-parameter lightweight open model focused on reasoning, supporting a 128K context length.

**Phi-3-mini-4k-instruct** · microsoft · MIT · Large Language Model · Transformers · Multilingual · 685.17k downloads · 1,176 likes
Phi-3-Mini-4K-Instruct is a lightweight, state-of-the-art open model with 3.8B parameters, trained with particular emphasis on high-quality, reasoning-dense data.

**SlimPLM-Query-Rewriting** · zstanjj · Large Language Model · Transformers · 53 downloads · 9 likes
A lightweight language model for query rewriting that parses user input into a structured form to improve retrieval quality.

**phixtral-2x2_8** · mlabonne · MIT · Large Language Model · Transformers · Multilingual · 178 downloads · 148 likes
phixtral-2x2_8 is the first mixture-of-experts (MoE) model built from two microsoft/phi-2 models, and it outperforms each individual expert.

**nekomata-14b-instruction-gguf** · rinna · Other · Large Language Model · Multilingual · 89 downloads · 11 likes
The GGUF version of rinna/nekomata-14b-instruction, compatible with llama.cpp for lightweight inference.
